Automated concept and relationship extraction for the semi-automated ontology management (SEAM) system

Authors

  • Kristina Doing-Harris
  • Yarden Livnat
  • Stéphane M. Meystre
Abstract

BACKGROUND We develop medical-specialty-specific ontologies that contain the settled science and common term usage. We leverage current practices in information and relationship extraction to streamline the ontology development process. Our system combines different text types with information and relationship extraction techniques in a low-overhead, modifiable system. Our SEmi-Automated ontology Maintenance (SEAM) system features a natural language processing pipeline for information extraction. Synonym and hierarchical groups are identified using corpus-based semantics and lexico-syntactic patterns. The semantic vectors we use are term frequency by inverse document frequency and context vectors. Clinical documents contain the terms we want in an ontology. They also contain idiosyncratic usage and are unlikely to contain the linguistic constructs associated with synonym and hierarchy identification. By including both clinical and biomedical texts, SEAM can recommend terms from those appearing in both document types. The set of recommended terms is then used to filter the synonyms and hierarchical relationships extracted from the biomedical corpus. We demonstrate the generality of the system across three use cases: ontologies for acute changes in mental status, Medically Unexplained Syndromes, and echocardiogram summary statements.

RESULTS Across the three use cases, we held the number of recommended terms relatively constant by changing SEAM's parameters. Experts seem to find more than 300 recommended terms to be overwhelming. The approval rate of recommended terms increased as the number and specificity of clinical documents in the corpus increased. It was 60% when there were 199 clinical documents that were not specific to the ontology domain and 90% when there were 2879 documents very specific to the target domain. We found that fewer than 100 recommended synonym groups were also preferred. Approval rates for synonym recommendations remained low, varying from 43% to 25% as the number of journal articles increased from 19 to 47. Overall, the number of recommended hierarchical relationships was very low, although approval was good. It varied between 67% and 31%.

CONCLUSION SEAM produced a concise list of recommended clinical terms, synonyms and hierarchical relationships regardless of medical domain.
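The term-recommendation and filtering steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`tfidf_scores`, `recommend_terms`, `hierarchical_pairs`), the single-word tokenizer, the single "X such as Y" pattern, and the rule that both ends of a relationship must be recommended terms are all assumptions made for illustration. The default cap of 300 terms only echoes the threshold the abstract reports experts finding overwhelming.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simplification: single lowercase words stand in for extracted terms.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_scores(docs):
    """Return the highest TF-IDF score each term reaches in any document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    scores = {}
    for toks in tokenized:
        for term, count in Counter(toks).items():
            idf = math.log(n_docs / doc_freq[term])
            score = (count / len(toks)) * idf
            scores[term] = max(scores.get(term, 0.0), score)
    return scores


def recommend_terms(clinical_docs, biomedical_docs, max_terms=300):
    """Recommend terms that occur in both corpora, ranked by clinical TF-IDF."""
    clinical = tfidf_scores(clinical_docs)
    biomedical = tfidf_scores(biomedical_docs)
    shared = set(clinical) & set(biomedical)
    ranked = sorted(shared, key=lambda t: (-clinical[t], t))
    return ranked[:max_terms]


# One illustrative lexico-syntactic pattern: "<parent> such as <child>".
SUCH_AS = re.compile(r"(\w+) such as (\w+)")


def hierarchical_pairs(biomedical_docs, recommended):
    """Extract (parent, child) pairs, keeping only recommended terms.

    Assumption: a pair is kept only when both terms are recommended.
    """
    pairs = set()
    for doc in biomedical_docs:
        for parent, child in SUCH_AS.findall(doc.lower()):
            if parent in recommended and child in recommended:
                pairs.add((parent, child))
    return pairs


clinical = [
    "patient shows delirium and other syndromes",
    "confusion noted overnight",
]
biomedical = [
    "syndromes such as delirium are acute",
    "drugs such as aspirin reduce confusion",
]
terms = recommend_terms(clinical, biomedical)
pairs = hierarchical_pairs(biomedical, set(terms))
```

In this toy corpus, "drugs such as aspirin" matches the pattern but is discarded because neither word appears in the clinical documents, which mirrors how the recommended term set narrows what is extracted from the biomedical text.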

Similar resources

Biomedical Semantics in the Big Data Era

1 Doing-Harris K, Livnat Y, Meystre S. Automated concept and relationship extraction for the Semi-Automated Ontology Management (SEAM) System. Journal of Biomedical Semantics 2015, 6:15. doi:10.1186/s13326-015-0011-7. Ontology; Natural language processing; Terminology extraction. Background: We develop medical-specialty specific ontologies that contain the settled science and common term usage. We ...

Cost Function Modelling for Semi-automated SC, RTG and Automated and Semi-automated RMG Container Yard Operating Systems

This study analyses the concept of cost functions for semi-automated Straddle Carrier (SC), Rubber Tyred Gantry (RTG) and automated Rail Mounted Gantry (RMG) container yard operating cranes. It develops a generic cost based model for a pair-wise comparison, analysis and evaluation of economic efficiency and effectiveness of container yard equipment to be used for decision-making by terminal pla...

Semi-quantitative segmental perfusion scoring in myocardial perfusion SPECT: visual vs. automated analysis

Introduction: It is recommended that the physician apply at least a semi-quantitative segmental scoring system in myocardial perfusion SPECT. We aimed to assess the agreement between automated semi-quantitative analysis using QPS (Quantitative Perfusion SPECT) software and visual approach for calculation of summed stress score (SSS), summed rest score (SRS) and summed difference score (SDS). ...

A Framework for Ontology Life Cycle Management

This paper describes a method and automation approach for ontology life cycle management. First, the challenges associated with knowledge creation are summarized. The problems associated with ontology capture, analysis, and maintenance are used to motivate the need for ontology life cycle management. Then a semi-automated method for ontology life cycle management is described. The method is com...

Ontirandoc (انتیرانداک): An Integrated System for Collaborative Development of Persian Ontologies

While ontology development is beneficial, it is very costly and time consuming. In order to reduce this cost as well as to increase the accuracy and quality of ontology development, researchers have proposed different methodologies. The goal of these methodologies is to present a systematic manual or semi-automated development of ontologies, while each differs and has its strengths and weakness...

Journal title: Journal of Biomedical Semantics

Volume 6, Issue 

Pages  -

Publication date: 2015